Demo. Data Evaluation

An example of SSH reconstruction has been produced in the "example_data_oi.ipynb" notebook. Here, an example of data evaluation is proposed. The notebook is structured as follow: 1) reading of reference and reconstructed SSH fields, 2) make field on similar spatio-temporal grid and 3) comparison of reconstrusted and refernce SSH fields (statistical/spectral comparison)

In [1]:
import xarray as xr
import cftime
import geoviews as gv
import matplotlib.pylab as plt
from matplotlib.ticker import ScalarFormatter

import numpy as np
from datetime import datetime, timedelta
import numpy
import pyinterp
import holoviews as hv
import xrft

from dask.diagnostics import ProgressBar

gv.extension('bokeh')
from matplotlib import rcParams
rcParams['font.sans-serif'] = 'TeX Gyre Heros'

1) Read reference & reconstructed SSH fields

Read reference SSH field
In [2]:
dc_ref = xr.open_mfdataset('./dc_ref/*.nc', combine='nested', concat_dim='time')
dc_ref
Out[2]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • lat: 600
    • lon: 600
    • time: 8760
    • lat
      (lat)
      float64
      33.02 33.03 33.05 ... 42.98 43.0
      valid_min :
      26.983333333333334
      valid_max :
      66.99999999999773
      array([33.016667, 33.033333, 33.05    , ..., 42.966667, 42.983333, 43.      ])
    • lon
      (lon)
      float64
      -64.98 -64.97 ... -55.02 -55.0
      valid_min :
      -80.01666666666667
      valid_max :
      8.016666666661663
      array([-64.983333, -64.966667, -64.95    , ..., -55.033333, -55.016667,
             -55.      ])
    • time
      (time)
      datetime64[ns]
      2012-10-01T00:30:00 ... 2013-09-30T23:30:00
      axis :
      T
      standard_name :
      time
      long_name :
      Time axis
      time_origin :
      1958-01-01 00:00:00
      bounds :
      time_counter_bounds
      array(['2012-10-01T00:30:00.000000000', '2012-10-01T01:30:00.000000000',
             '2012-10-01T02:30:00.000000000', ..., '2013-09-30T21:30:00.000000000',
             '2013-09-30T22:30:00.000000000', '2013-09-30T23:30:00.000000000'],
            dtype='datetime64[ns]')
    • sossheig
      (time, lat, lon)
      float32
      dask.array<chunksize=(24, 600, 600), meta=np.ndarray>
      standard_name :
      sea_surface_height_above_geoid
      long_name :
      seasurfaceheight
      units :
      m
      online_operation :
      average
      interval_operation :
      40 s
      interval_write :
      1 h
      cell_methods :
      time: mean (interval: 40 s)
      actual_range :
      [-0.7419584 1.1676793]
      Array Chunk
      Bytes 12.61 GB 34.56 MB
      Shape (8760, 600, 600) (24, 600, 600)
      Count 1095 Tasks 365 Chunks
      Type float32 numpy.ndarray
      600 600 8760
  • Info :
    Horizontal grid read in regulargrid_NATL60.nc / Source field read in /home/cubelmann/Data/Data_natl60_hourly/NATL60-CJM165_y2012m10d01.1h_SSH.nc / Interpolation method: bilin
    About :
    Created by SOSIE interpolation environement => https://github.com/brodeau/sosie/
Read reconstructed SSH field
In [3]:
dc_reconstruction = xr.open_mfdataset('ssh_rec.nc', combine='by_coords')
dc_reconstruction
Out[3]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • lat: 52
    • lon: 51
    • time: 365
    • time
      (time)
      datetime64[ns]
      2012-10-01 ... 2013-09-30
      array(['2012-10-01T00:00:00.000000000', '2012-10-02T00:00:00.000000000',
             '2012-10-03T00:00:00.000000000', ..., '2013-09-28T00:00:00.000000000',
             '2013-09-29T00:00:00.000000000', '2013-09-30T00:00:00.000000000'],
            dtype='datetime64[ns]')
    • lon
      (lon)
      float64
      -65.0 -64.8 -64.6 ... -55.2 -55.0
      array([-65. , -64.8, -64.6, -64.4, -64.2, -64. , -63.8, -63.6, -63.4, -63.2,
             -63. , -62.8, -62.6, -62.4, -62.2, -62. , -61.8, -61.6, -61.4, -61.2,
             -61. , -60.8, -60.6, -60.4, -60.2, -60. , -59.8, -59.6, -59.4, -59.2,
             -59. , -58.8, -58.6, -58.4, -58.2, -58. , -57.8, -57.6, -57.4, -57.2,
             -57. , -56.8, -56.6, -56.4, -56.2, -56. , -55.8, -55.6, -55.4, -55.2,
             -55. ])
    • lat
      (lat)
      float64
      33.0 33.2 33.4 ... 42.8 43.0 43.2
      array([33. , 33.2, 33.4, 33.6, 33.8, 34. , 34.2, 34.4, 34.6, 34.8, 35. , 35.2,
             35.4, 35.6, 35.8, 36. , 36.2, 36.4, 36.6, 36.8, 37. , 37.2, 37.4, 37.6,
             37.8, 38. , 38.2, 38.4, 38.6, 38.8, 39. , 39.2, 39.4, 39.6, 39.8, 40. ,
             40.2, 40.4, 40.6, 40.8, 41. , 41.2, 41.4, 41.6, 41.8, 42. , 42.2, 42.4,
             42.6, 42.8, 43. , 43.2])
    • ssh_rec
      (time, lat, lon)
      float64
      dask.array<chunksize=(365, 52, 51), meta=np.ndarray>
      Array Chunk
      Bytes 7.74 MB 7.74 MB
      Shape (365, 52, 51) (365, 52, 51)
      Count 2 Tasks 1 Chunks
      Type float64 numpy.ndarray
      51 52 365
    • nobs
      (time)
      float64
      dask.array<chunksize=(365,), meta=np.ndarray>
      Array Chunk
      Bytes 2.92 kB 2.92 kB
      Shape (365,) (365,)
      Count 2 Tasks 1 Chunks
      Type float64 numpy.ndarray
      365 1

2) Regriding: make reconstructed & reference SSH fields onto the same grid

The reconstructed SSH is not on the same spatio-temporal grid as the refrence field,

A regridding on the similar spato-temporal grid s required for the comparison. Here we have mde the choice to "temporally degrade" the reference SSH field into daily mean sample and interpolate the reconstructed field onto this new reference spatio-temporal grid. For this, the pyinterp package is used

In [4]:
dc_ref_sample = dc_ref.resample(time='1D').mean()
dc_ref_sample
Out[4]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • lat: 600
    • lon: 600
    • time: 365
    • time
      (time)
      datetime64[ns]
      2012-10-01 ... 2013-09-30
      array(['2012-10-01T00:00:00.000000000', '2012-10-02T00:00:00.000000000',
             '2012-10-03T00:00:00.000000000', ..., '2013-09-28T00:00:00.000000000',
             '2013-09-29T00:00:00.000000000', '2013-09-30T00:00:00.000000000'],
            dtype='datetime64[ns]')
    • lat
      (lat)
      float64
      33.02 33.03 33.05 ... 42.98 43.0
      valid_min :
      26.983333333333334
      valid_max :
      66.99999999999773
      array([33.016667, 33.033333, 33.05    , ..., 42.966667, 42.983333, 43.      ])
    • lon
      (lon)
      float64
      -64.98 -64.97 ... -55.02 -55.0
      valid_min :
      -80.01666666666667
      valid_max :
      8.016666666661663
      array([-64.983333, -64.966667, -64.95    , ..., -55.033333, -55.016667,
             -55.      ])
    • sossheig
      (time, lat, lon)
      float32
      dask.array<chunksize=(1, 600, 600), meta=np.ndarray>
      Array Chunk
      Bytes 525.60 MB 1.44 MB
      Shape (365, 600, 600) (1, 600, 600)
      Count 2920 Tasks 365 Chunks
      Type float32 numpy.ndarray
      600 600 365
Define reconstruction grid (source grid to interpolate)
In [5]:
x_axis = pyinterp.Axis(dc_reconstruction["lon"][:], is_circle=False)
y_axis = pyinterp.Axis(dc_reconstruction["lat"][:])
z_axis = pyinterp.TemporalAxis(dc_reconstruction["time"][:])
In [6]:
ssh_rec = dc_reconstruction["ssh_rec"][:].T
In [7]:
grid = pyinterp.Grid3D(x_axis, y_axis, z_axis, ssh_rec.data)
Define reference grid (target grid where to interpolate)
In [8]:
mx, my, mz = np.meshgrid(
    dc_ref_sample.lon.values,
    dc_ref_sample.lat.values,
    z_axis.safe_cast(dc_ref_sample.time.values),
    indexing="ij")
Interpolate...
In [9]:
ssh_rec = pyinterp.trivariate(grid,
                          mx.flatten(),
                          my.flatten(),
                          mz.flatten(),
                          bounds_error=True).reshape(mx.shape).T
Save the SSH reconstruction interpolated onto the reference spatio-temporal grid into dc_reconstruction_interp dataset
In [10]:
dc_reconstruction_interp = xr.Dataset({'sossheig' : (('time', 'lat', 'lon'), ssh_rec)},
                               coords={'time': dc_ref_sample.time.values,
                                       'lon': dc_ref_sample.lon.values, 
                                       'lat': dc_ref_sample.lat.values, 
                                       })  
dc_reconstruction_interp  
Out[10]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • lat: 600
    • lon: 600
    • time: 365
    • time
      (time)
      datetime64[ns]
      2012-10-01 ... 2013-09-30
      array(['2012-10-01T00:00:00.000000000', '2012-10-02T00:00:00.000000000',
             '2012-10-03T00:00:00.000000000', ..., '2013-09-28T00:00:00.000000000',
             '2013-09-29T00:00:00.000000000', '2013-09-30T00:00:00.000000000'],
            dtype='datetime64[ns]')
    • lon
      (lon)
      float64
      -64.98 -64.97 ... -55.02 -55.0
      array([-64.983333, -64.966667, -64.95    , ..., -55.033333, -55.016667,
             -55.      ])
    • lat
      (lat)
      float64
      33.02 33.03 33.05 ... 42.98 43.0
      array([33.016667, 33.033333, 33.05    , ..., 42.966667, 42.983333, 43.      ])
    • sossheig
      (time, lat, lon)
      float64
      0.1958 0.1962 ... -0.03628 -0.03315
      array([[[ 0.19579816,  0.19621586,  0.19663355, ...,  0.48705379,
                0.48682564,  0.48659749],
              [ 0.19346891,  0.1938624 ,  0.19425588, ...,  0.48706254,
                0.48683611,  0.48660968],
              [ 0.19113967,  0.19150894,  0.19187821, ...,  0.48707129,
                0.48684658,  0.48662188],
              ...,
              [-0.02718423, -0.02634752, -0.02551082, ..., -0.02158707,
               -0.02115059, -0.02071412],
              [-0.02656861, -0.02574062, -0.02491263, ..., -0.02382176,
               -0.02335083, -0.0228799 ],
              [-0.025953  , -0.02513372, -0.02431444, ..., -0.02605646,
               -0.02555106, -0.02504567]],
      
             [[ 0.25865868,  0.25929682,  0.25993496, ...,  0.52860388,
                0.52846239,  0.5283209 ],
              [ 0.25558952,  0.25619674,  0.25680396, ...,  0.52899543,
                0.52885229,  0.52870916],
              [ 0.25252035,  0.25309665,  0.25367295, ...,  0.52938698,
                0.52924219,  0.52909741],
              ...,
              [-0.03155639, -0.030675  , -0.02979361, ..., -0.00278255,
               -0.00226846, -0.00175437],
              [-0.03084611, -0.0299712 , -0.0290963 , ..., -0.00487206,
               -0.00432469, -0.00377731],
              [-0.03013583, -0.02926741, -0.02839899, ..., -0.00696158,
               -0.00638091, -0.00580025]],
      
             [[ 0.32771804,  0.32863111,  0.32954417, ...,  0.55258112,
                0.55249041,  0.5523997 ],
              [ 0.32383822,  0.32471339,  0.32558856, ...,  0.55337278,
                0.55327817,  0.55318356],
              [ 0.31995841,  0.32079568,  0.32163295, ...,  0.55416443,
                0.55406593,  0.55396742],
              ...,
              [-0.02780847, -0.02677515, -0.02574184, ...,  0.00703393,
                0.0072875 ,  0.00754107],
              [-0.02713031, -0.02610043, -0.02507056, ...,  0.00505271,
                0.00533111,  0.00560951],
              [-0.02645215, -0.02542571, -0.02439928, ...,  0.00307149,
                0.00337472,  0.00367796]],
      
             ...,
      
             [[ 0.52335297,  0.52181042,  0.52026787, ...,  0.46744684,
                0.46448642,  0.461526  ],
              [ 0.52537169,  0.52383558,  0.52229947, ...,  0.46763478,
                0.46472134,  0.4618079 ],
              [ 0.52739042,  0.52586075,  0.52433108, ...,  0.46782273,
                0.46495627,  0.4620898 ],
              ...,
              [-0.06708476, -0.06550783, -0.06393091, ..., -0.06258975,
               -0.05998558, -0.05738141],
              [-0.06656605, -0.06485922, -0.06315239, ..., -0.05890662,
               -0.05640678, -0.05390694],
              [-0.06604735, -0.06421061, -0.06237388, ..., -0.0552235 ,
               -0.05282798, -0.05043246]],
      
             [[ 0.51622231,  0.51478921,  0.51335612, ...,  0.42438398,
                0.42147255,  0.41856112],
              [ 0.51861917,  0.51718712,  0.51575506, ...,  0.42463323,
                0.42176404,  0.41889485],
              [ 0.52101604,  0.51958502,  0.51815401, ...,  0.42488248,
                0.42205553,  0.41922859],
              ...,
              [-0.06211971, -0.06070588, -0.05929205, ..., -0.05738657,
               -0.0543261 , -0.05126564],
              [-0.0614084 , -0.05987429, -0.05834018, ..., -0.05323061,
               -0.05030036, -0.04737012],
              [-0.06069709, -0.0590427 , -0.05738831, ..., -0.04907465,
               -0.04627463, -0.04347461]],
      
             [[ 0.49670214,  0.49535057,  0.493999  , ...,  0.3846522 ,
                0.38218858,  0.37972496],
              [ 0.49942531,  0.49806994,  0.49671457, ...,  0.385316  ,
                0.38288991,  0.38046381],
              [ 0.50214848,  0.5007893 ,  0.49943013, ...,  0.3859798 ,
                0.38359123,  0.38120267],
              ...,
              [-0.05542468, -0.05426124, -0.0530978 , ..., -0.04861483,
               -0.04520327, -0.04179172],
              [-0.05451812, -0.05325023, -0.05198235, ..., -0.0440111 ,
               -0.04073974, -0.03746839],
              [-0.05361155, -0.05223923, -0.0508669 , ..., -0.03940738,
               -0.03627621, -0.03314505]]])

3) Comparison between reference and reconstructed SSH fields

Plot example...
In [11]:
time_selection = '2013-01-01T23:00:00'
plt.figure(figsize=(15, 5))
plt.subplot(121)
dc_reconstruction_interp.sossheig.sel(time=time_selection, method='nearest').plot(vmin=-0.2, vmax=1, cmap='gist_stern')
plt.title('Reconstruction')
plt.subplot(122)
dc_ref_sample.sossheig.sel(time=time_selection, method='nearest').plot(vmin=-0.2, vmax=1, cmap='gist_stern')
plt.title('Reference')
# plt.savefig('example_ssh.png')
Out[11]:
Text(0.5, 1.0, 'Reference')
In [20]:
# SSH reconstruction resampled, otherwise too heavy for display
dataset2 = gv.Dataset(dc_reconstruction_interp.coarsen({'lon':6, 'lat': 6, 'time':6}, boundary="trim").mean(), ['lon', 'lat', 'time'], 'sossheig')
images2 = dataset2.to(gv.Image).redim(sossheig=dict(range=(-0.2, 1.)))

# SSH reference resampled, otherwise too heavy for display
dataset3 = gv.Dataset(dc_ref_sample.coarsen({'lon':6, 'lat': 6, 'time':6}, boundary="trim").mean(), ['lon', 'lat', 'time'], 'sossheig')
images3 = dataset3.to(gv.Image).redim(sossheig=dict(range=(-0.2, 1.)))

# delta SSH
delta_ssh = (dc_reconstruction_interp - dc_ref_sample).coarsen({'lon':6, 'lat': 6, 'time':6}, boundary="trim").mean()
delta_ssh = delta_ssh.rename({'sossheig': 'delta_ssh'})

dataset1 = gv.Dataset(delta_ssh, ['lon', 'lat', 'time'], 'delta_ssh')
images1 = dataset1.to(gv.Image).redim(delta_ssh=dict(range=(-0.3, 0.3)))
In [21]:
layout = hv.Layout(images3.opts(width=400, height=300, cmap='gist_stern', colorbar=True, title='SSH reference', toolbar='above') + images2.opts(width=400, height=300, cmap='gist_stern', colorbar=True, title='SSH reconstruction', toolbar='above') + images1.opts(width=400, height=300, cmap='coolwarm', colorbar=True, title='SSH reconstruction - reference')).cols(2)
In [22]:
layout
Out[22]:

4) Evaluation of the reconstructed SSH fields

Compute RMSE-based score
In [15]:
rmse_t = (((dc_reconstruction_interp.sossheig - dc_ref_sample.sossheig)**2).mean(dim=('lon', 'lat')))**0.5/(((dc_ref_sample.sossheig)**2).mean(dim=('lon', 'lat')))**0.5
In [29]:
plt.figure(figsize=(15, 5))
(1.0 - rmse_t).plot(color='r', lw=3)
plt.ylabel('RMSEs(t)', color='r', fontweight='bold', fontsize=20)
plt.xlabel('Time', fontweight='bold', fontsize=20)
plt.ylim(0, 1)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.grid(ls='--')
plt.twinx()
plt.bar(dc_reconstruction.time.values, dc_reconstruction.nobs.values, color='grey', alpha=0.3, zorder=1)
plt.ylabel('nobs(t)', color='grey', fontweight='bold', fontsize=20)
plt.ylim(0, 10000)
plt.yticks(fontsize=18)
plt.title('RMSE-based score', fontweight='bold', fontsize=20)
plt.savefig('rmse_t.png')
Compute PSD-based score
In [17]:
with ProgressBar():

    err = (dc_reconstruction_interp.sossheig - dc_ref_sample.sossheig)
    err = err.chunk({"lat":1, 'time': err.time.size, 'lon': err.lon.size})
    # make time vector in days units (can be nicer !!!!!)
    err['time'] = np.arange(0, err['time'].values.size, 1.)#/24.
    
    signal = dc_ref_sample.sossheig.chunk({"lat":1, 'time': dc_ref_sample.sossheig.time.size, 'lon': dc_ref_sample.sossheig.lon.size})
    # make time vector in days units
    signal['time'] = np.arange(0, signal['time'].values.size, 1.)#/24.
    
    
    psd_err = xrft.power_spectrum(err, dim=['time', 'lon'], detrend='linear', window=True).compute()
    psd_signal = xrft.power_spectrum(signal, dim=['time', 'lon'], detrend='linear', window=True).compute()
    
[########################################] | 100% Completed | 39.1s
[########################################] | 100% Completed | 40.2s
In [18]:
mean_psd_signal = psd_signal.mean(dim='lat').where((psd_signal.freq_lon > 0.) & (psd_signal.freq_time > 0), drop=True)
mean_psd_err = psd_err.mean(dim='lat').where((psd_err.freq_lon > 0.) & (psd_err.freq_time > 0), drop=True)
In [27]:
plt.figure(figsize=(8, 6))
ax = plt.gca()
ax.invert_yaxis()
ax.invert_xaxis()
c1 = plt.contourf(1./(mean_psd_signal.freq_lon), 1./mean_psd_signal.freq_time, (1.0 - mean_psd_err/mean_psd_signal), levels=np.arange(0,1.1, 0.1), cmap='RdYlGn', extend='both')
cbar = plt.colorbar(pad=0.01)
plt.xlabel('wavenumber(degree_lon)', fontweight='bold', fontsize=20)
plt.ylabel('frequency (days)', fontweight='bold', fontsize=20)
#plt.xscale('log')
plt.yscale('log')
plt.grid(linestyle='--', lw=1, color='w')
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.title('PSD-based score', fontweight='bold', fontsize=20)
for axis in [ax.xaxis, ax.yaxis]:
    axis.set_major_formatter(ScalarFormatter())
c2 = plt.contour(1./(mean_psd_signal.freq_lon), 1./mean_psd_signal.freq_time, (1.0 - mean_psd_err/mean_psd_signal), levels=[0.5], linewidths=2, colors='k')
cbar.add_lines(c2)

bbox_props = dict(boxstyle="round,pad=0.5", fc="w", ec="k", lw=2)
ax.annotate('Resolved scales',
            xy=(1.15, 0.8),
            xycoords='axes fraction',
            xytext=(1.15, 0.55),
            bbox=bbox_props,
            arrowprops=
                dict(facecolor='black', shrink=0.05),
                horizontalalignment='left',
                verticalalignment='center')

ax.annotate('UN-resolved scales',
            xy=(1.15, 0.2),
            xycoords='axes fraction',
            xytext=(1.15, 0.45),
            bbox=bbox_props,
            arrowprops=
                dict(facecolor='black', shrink=0.05),
                horizontalalignment='left',
                verticalalignment='center')
Out[27]:
Text(1.15, 0.45, 'UN-resolved scales')